Add mart for OCW resources #2236

Merged
pt2302 merged 3 commits into main from pt/ocw_resources_mart on May 21, 2026

pt2302 commented on May 21, 2026

What are the relevant tickets?

Part of https://github.com/mitodl/hq/issues/9943.

Description (What does it do?)

This PR adds a dimensional-layer-backed mart for OCW resources, replacing the Superset dataset that currently reads int__ocw__resources directly. It adds dim_ocw_resource (one row per course_uuid/resource_uuid pair, sourced from int__ocw__resources) and marts__ocw_resources (which references only the dimensional layer, not any int__* models). It also surfaces nine fields from the raw resource metadata JSON as scalar columns (resource_license, resource_description, resource_file_type, resource_file_size, resource_ocw_type, resource_audience, resource_level, external_resource_status, and external_resource_wayback_url) and drops the raw metadata blob from the dim and mart. A few of these fields (resource_audience, resource_level, external_resource_wayback_url) are currently sparse but are expected to fill in over time.
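For readers unfamiliar with the pattern, here is a minimal sketch of how a field can be pulled out of a metadata JSON string into a scalar column in Trino/Starburst SQL. It is not the PR's actual SQL: the JSON keys (license, file_type, file_size) and the sample row are hypothetical placeholders; only the column name websitecontent_metadata comes from the review summary below.

```sql
-- Illustrative sketch only: JSON keys and the sample row are hypothetical.
select
    -- json_extract_scalar returns the value at the JSON path as varchar (NULL if absent)
    json_extract_scalar(websitecontent_metadata, '$.license') as resource_license,
    json_extract_scalar(websitecontent_metadata, '$.file_type') as resource_file_type,
    -- try_cast yields NULL instead of failing when the value is not numeric
    try_cast(json_extract_scalar(websitecontent_metadata, '$.file_size') as bigint) as resource_file_size
from (
    -- inline sample row standing in for the real source table
    values ('{"license": "CC BY-NC-SA 4.0", "file_type": "application/pdf", "file_size": 123456}')
) as src (websitecontent_metadata)
```

Keys that are missing from a given resource's JSON come back as NULL, which is consistent with some of the new columns being sparse.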

How can this be tested?

First, run

uv run dbt run \
  --select +marts__ocw_resources \
  --full-refresh \
  --vars 'schema_suffix: <your name>' \
  --project-dir src/ol_dbt/ \
  --profiles-dir src/ol_dbt/ \
  --target dev_production

replacing <your name> with your own schema suffix.

Then, run

uv run dbt test \
  --select int__ocw__resources dim_ocw_resource marts__ocw_resources \
  --vars 'schema_suffix: <your name>' \
  --project-dir src/ol_dbt/ \
  --profiles-dir src/ol_dbt/ \
  --target dev_production

Finally, smoke-test this in Starburst Galaxy (https://mitol.galaxy.starburst.io/query-editor) by running a query such as

select
    course_number,
    course_title,
    resource_title,
    content_type,
    resource_license,
    resource_file_type,
    external_resource_status,
    external_resource_is_broken
from ol_data_lake_production.ol_warehouse_production_<your name>_mart.marts__ocw_resources
order by course_number
limit 20;
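As an optional extra check (not part of the PR's test steps), a query like the following shows how sparse the fields called out above currently are in your dev build. It assumes missing values are stored as NULL rather than as empty strings, since count(column) only skips NULLs.

```sql
-- Fill-rate check for the sparse columns; assumes missing values are NULL, not ''.
select
    count(*) as total_resources,
    count(resource_audience) as with_audience,
    count(resource_level) as with_level,
    count(external_resource_wayback_url) as with_wayback_url
from ol_data_lake_production.ol_warehouse_production_<your name>_mart.marts__ocw_resources;
```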

Copilot AI review requested due to automatic review settings May 21, 2026 04:14

Copilot AI left a comment

Pull request overview

This PR is intended to introduce a dimensional-layer-backed OCW resources mart (per the PR description) and begins surfacing additional OCW resource metadata fields as scalar columns for downstream analysis.

Changes:

  • Added scalar extraction of several resource-level metadata fields (license, description, file_type/size, ocw_type, external link status/wayback URL, audience, level) to int__ocw__resources.
  • Updated intermediate and marts YAML model documentation (including adding a new marts__ocw_resources model entry) and cleaned up course_level description formatting.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

  • src/ol_dbt/models/marts/ocw/_marts__ocw__models.yml: Adds schema/docs for marts__ocw_resources (but currently missing the corresponding model SQL in-repo).
  • src/ol_dbt/models/intermediate/ocw/int__ocw__resources.sql: Extracts additional scalar fields from websitecontent_metadata JSON into dedicated columns.
  • src/ol_dbt/models/intermediate/ocw/_int_ocw__models.yml: Documents the new extracted columns on int__ocw__resources and fixes course_level description wrapping.

Comment on lines +76 to +79
- name: marts__ocw_resources
  description: OCW course resources (files, external resources, video, image) for
    review and analysis
  columns:
Contributor Author

Updated in efa4981.

Comment on lines +172 to +173
- dbt_utils.unique_combination_of_columns:
    combination_of_columns:
Contributor Author

Updated in 57b582d.

Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated no new comments.

@@ -0,0 +1,45 @@
select
Contributor

Not sure if you meant to leave out the config block here; without one, the model uses the project's default materialization (likely a view). That should be fine since dim_ocw_resource is already a table; I just wanted to note it.

Contributor Author

Yes, this was intentional. The config blocks on a few other marts are grants for sensitive certificate/profile data, which this mart does not need.
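For context, if this mart ever did need a non-default materialization, dbt sets it with a config block at the top of the model's .sql file. A minimal, hypothetical example (the PR itself intentionally omits the block and keeps the project default):

```sql
-- Hypothetical: not part of this PR; overrides the project-level default materialization.
{{ config(materialized='table') }}
```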

quazi-h left a comment

Builds ran cleanly for me and the code looks good. I posted one comment, but it is not a blocker.

pt2302 merged commit b0c0bd8 into main on May 21, 2026
9 checks passed
pt2302 deleted the pt/ocw_resources_mart branch on May 21, 2026 at 18:56